37 research outputs found

    A Weighted Maximum Entropy Language Model for Text Classification

    Abstract. The maximum entropy (ME) approach has been used extensively for various natural language processing tasks, such as language modeling, part-of-speech tagging, text segmentation and text classification. Previous work in text classification has applied maximum entropy modeling with binary-valued features or counts of feature words. In this work, we present a method that applies maximum entropy modeling to text classification differently from earlier approaches: weights are used both to select the features of the model and to emphasize the importance of each feature in the classification task. We use the chi-square (χ²) test to assess the contribution of each candidate feature, rank the features by the resulting χ² values, and select the highest-scoring ones as the features of the model. Instead of using maximum entropy modeling in the classical way, we use the χ² values to weight the features of the model and thus give a different importance to each of them. The method has been evaluated on the Reuters-21578 dataset for text classification tasks, giving very promising results and performing comparably to some of the state-of-the-art systems in the classification field.
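
The χ² ranking step described in the abstract can be sketched as follows. This is a minimal illustration using the standard 2x2 contingency-table form of the χ² statistic for a term and a class; the function names (`chi_square`, `rank_terms`) and the toy documents are assumptions made for the example, and the abstract does not specify how the χ² values enter the maximum entropy model as weights, so that part is not shown.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(docs, labels, target):
    """Rank terms by chi-square score against one target class.

    docs is a list of term collections, labels the matching class labels.
    Returns (term, score) pairs, highest score first, ties broken by term.
    """
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    in_class = Counter()   # documents of the target class containing the term
    overall = Counter()    # all documents containing the term
    for terms, y in zip(docs, labels):
        for t in set(terms):
            overall[t] += 1
            if y == target:
                in_class[t] += 1
    scores = {}
    for t, total in overall.items():
        a = in_class[t]
        b = total - a
        c = n_pos - a
        d = n_docs - n_pos - b
        scores[t] = chi_square(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Note that the statistic is symmetric: a term that appears only outside the target class scores as high as one that appears only inside it, so a ranking like this selects terms that are informative in either direction.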

    EXTENDED SPEECH EMOTION RECOGNITION AND PREDICTION

    Humans are considered to reason and act rationally, and this is believed to be their fundamental difference from other living entities. Modern approaches in psychology, however, underline that humans, as thinking creatures, are also sentimental and emotional organisms. There are fifteen universal extended emotions plus the neutral state: hot anger, cold anger, panic, fear, anxiety, despair, sadness, elation, happiness, interest, boredom, shame, pride, disgust, contempt and neutral. The scope of the current research is to determine the emotional state of a human being from the speech utterances used during ordinary conversation. It is shown that, given enough acoustic evidence, the emotional state of a person can be classified by a set of majority voting classifiers. The proposed set is based on three main classifiers: kNN, C4.5 and SVM with an RBF kernel. This set achieves better performance than each basic classifier taken separately. It is compared with two other sets of classifiers: a one-against-all (OAA) multiclass SVM with hybrid kernels, and a set consisting of two basic classifiers, C5.0 and a neural network. The proposed variant achieves better performance than the other two sets. The paper thus deals with emotion classification by a set of majority voting classifiers that combines three specific types of basic classifiers with low computational complexity. The basic classifiers stem from different theoretical backgrounds in order to avoid bias and redundancy, which gives the proposed set the ability to generalize in the emotion domain space.
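
The majority voting scheme mentioned above can be illustrated with a short sketch. It assumes each base classifier (kNN, C4.5 and SVM in the abstract) has already produced one label per utterance; the function names and the tie-breaking rule (the earliest listed classifier wins a tie) are assumptions for the example, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most classifiers for one sample.

    votes is a non-empty list with one predicted label per classifier.
    Ties are broken in favour of the earliest vote in the list (an
    assumption of this sketch).
    """
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier):
    """Combine predictions from several classifiers, sample by sample.

    per_classifier is a list of equal-length prediction lists, one list
    per base classifier.
    """
    return [majority_vote(list(votes)) for votes in zip(*per_classifier)]
```

With three voters and sixteen classes, three-way disagreements are possible, so some explicit tie-breaking rule is needed; ordering the classifiers by validation accuracy is one common choice.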

    Text Classification: Forming Candidate Key-Phrases from Existing Shorter Ones

    Abstract: The hard problem of text classification has various aspects and potential solutions. In this paper, two main research directions for the classification of narrative documents are considered. The first is based on data mining and rule induction techniques, while the second combines traditional text retrieval techniques (the vector space model, index terms and similarity measures) with natural language processing and instance-based learning techniques. Key-phrases can be used as attributes for mining rules or as a basis for measuring the similarity of new (unclassified) documents to existing (classified) ones. Hence, we focus on the problem of extracting key-phrases from a text collection in order to use them as attributes for text classification. A new algorithm for the discovery of key-phrases is described. Candidate key-phrases are built from frequent shorter ones, and special emphasis is given to reducing the complexity of the algorithm.
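
One way to build longer candidates from frequent shorter phrases is an Apriori-style join, sketched below. This is an assumption about the general idea, not the paper's algorithm, which the abstract does not detail: two frequent k-word phrases are joined into a (k+1)-word candidate when the last k-1 words of the first equal the first k-1 words of the second. The function name `join_candidates` is hypothetical.

```python
def join_candidates(frequent):
    """Build (k+1)-word candidate phrases from frequent k-word phrases.

    frequent is a set of equal-length word tuples. Two phrases are joined
    when the first one, minus its first word, equals the second one, minus
    its last word. Only phrases whose shorter parts are already frequent
    are generated, which keeps the candidate set small.
    """
    candidates = set()
    for left in frequent:
        for right in frequent:
            if left[1:] == right[:-1]:
                candidates.add(left + right[-1:])
    return candidates
```

Each candidate would still need to be counted in the collection and kept only if it meets the frequency threshold before the next round of joining.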

    Using WordNet Lexical Database and Internet to Disambiguate Word Senses

    Abstract. The term “knowledge acquisition bottleneck” has been used in Word Sense Disambiguation Tasks (WSDTs) to express the problem of the lack of large tagged corpora. In this paper, an automated WSDT is based on text corpora collected from Internet web pages. First, to disambiguate the sense of a word in a context, its definition and the definitions of its direct hyponyms in WordNet are used to form queries for searching the Internet. Then, the “sense-related examples”, in other words the collected answers, are used to disambiguate the word’s sense in the context. A similarity metric is used to calculate the similarity between the context and the “sense-related examples”, and the word is assigned the sense of the example most similar to the context. Some experiments are briefly described and the evaluation of the proposed method is discussed.
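
The final assignment step, choosing the sense whose example is most similar to the context, can be sketched as follows. The abstract does not name the similarity metric, so cosine similarity over bag-of-words counts is used here purely as a stand-in; the function names, sense labels and example sentences are hypothetical.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between two texts using lowercase word counts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def pick_sense(context, examples):
    """Return the sense label of the example most similar to the context.

    examples is a non-empty list of (sense_label, example_text) pairs,
    standing in for the sense-related examples collected from the web.
    """
    return max(examples, key=lambda ex: cosine(context, ex[1]))[0]
```

Raw word counts let frequent function words dominate the score, so a practical version would likely remove stop words or apply a weighting such as tf-idf before comparing.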